A Noise Addition Scheme in Decision Tree for Privacy Preserving Data Mining

Authors

  • Mohammad Ali Kadampur
  • Durvasula V. L. N. Somayajulu
Abstract

Data mining deals with the automatic extraction of previously unknown patterns from large amounts of data. Organizations all over the world handle large amounts of data and depend on mining very large data sets to expand their enterprises. These data sets typically contain sensitive individual information, which is consequently exposed to other parties. Though we cannot deny the benefits of the knowledge discovery that comes through data mining, we should also ensure that data privacy is maintained while the data is mined. Privacy preserving data mining is a specialized activity in which data privacy is ensured during data mining. Data privacy is as important as the extracted knowledge, and efforts that guarantee data privacy during data mining are encouraged. In this paper we propose a strategy that protects data privacy during the decision tree analysis stage of the data mining process. We propose to add specific noise to the numeric attributes after exploring the decision tree of the original data. The obfuscated data is then presented to the second party for decision tree analysis. The decision trees obtained from the original data and from the obfuscated data are similar, but with our method the data proper is not revealed to the second party during the mining process, and hence privacy is preserved.
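The abstract does not say how the "specific noise" is chosen, so the following is only a minimal sketch of one way such a scheme could work, not the authors' method. It assumes the split thresholds of a numeric attribute have already been read off the decision tree built on the original data, and that splits have the form `x <= t`. The function name `perturb_within_splits` and the choice of uniform noise are hypothetical, introduced purely for illustration.

```python
import bisect
import random


def perturb_within_splits(values, thresholds, rng=None):
    """Replace each value with a random value drawn from the same split interval.

    Assumption (not from the paper): splits have the form `x <= t`, so a value
    belongs to the interval (previous threshold, next threshold].
    """
    rng = rng or random.Random()
    cuts = sorted(thresholds)
    lowest, highest = min(values), max(values)
    noisy = []
    for v in values:
        # First threshold that is >= v; v lies in (cuts[i-1], cuts[i]].
        i = bisect.bisect_left(cuts, v)
        lo = cuts[i - 1] if i > 0 else lowest
        hi = cuts[i] if i < len(cuts) else highest
        # Draw a replacement inside the interval, clamped to its upper edge.
        x = min(rng.uniform(lo, hi), hi)
        # The lower edge is exclusive; fall back to the original value if hit.
        if i > 0 and x <= lo:
            x = v
        noisy.append(x)
    return noisy


# Hypothetical attribute values and thresholds, for illustration only.
ages = [23, 31, 38, 45, 52, 67]
cuts = [30.5, 48.5]
noisy_ages = perturb_within_splits(ages, cuts, random.Random(0))
```

Because every perturbed value stays on the same side of each listed threshold, a tree that splits on those thresholds routes the obfuscated records exactly as it routed the originals. How much privacy such noise actually provides is a separate question that this sketch does not address.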

Similar resources

Privacy-Preserving Decision Tree Mining Using A Random Replacement Perturbation

Privacy-preserving data mining has become an important topic, and many methods have been proposed for a diverse set of privacy-preserving data mining tasks. However, privacy-preserving decision tree mining, pioneered by [1], remains elusive. Indeed, the work of [1] was recently shown to be flawed [2], meaning that an adversary can actually recover the original data from the perturbed o...

An Improvement of Privacy-Preserving Scheme Based on Random Substitutions

Data perturbation techniques are among the most popular models for privacy-preserving data mining due to their practical utility [1]. In a typical data perturbation, before the data owner publishes the data, they randomly change the data in a certain way to disguise the private information while preserving some statistical properties for obtaining meaningful data mining models. Agrawal and Harit...

Privacy-Preserving Imputation of Missing

Handling missing data is a critical step to ensuring good results in data mining. Like most data mining algorithms, existing privacy-preserving data mining algorithms assume data is complete. In order to maintain privacy in the data mining process while cleaning data, privacy-preserving methods of data cleaning will be required. In this paper, we address the problem of privacy-preserving data i...

Privacy-preserving imputation of missing data

Handling missing data is a critical step to ensuring good results in data mining. Like most data mining algorithms, existing privacy-preserving data mining algorithms assume data is complete. In order to maintain privacy in the data mining process while cleaning data, privacy-preserving methods of data cleaning are required. In this paper, we address the problem of privacy-preserving data imput...

Privacy Preserving Data Mining using Random Decision Tree

Data processing with information privacy and information utility has emerged to manage distributed information expeditiously. In this paper, to deal with this advancement in privacy protective data processing technology victimization intensify approach of Random Decision Tree (RDT). Random Decision Tree provides higher potency and information privacy than Privacy secured Data mining Techni...

Journal title:
  • CoRR

Volume abs/1001.3504

Publication date 2010